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Abstract 

Vector valued data appearing in concrete applications often possess sparse expan- 
sions with respect to a preassigned frame for each vector component individually. Ad- 
ditionally, different components may also exhibit common sparsity patterns. Recently, 
there were introduced sparsity measures that take into account such joint sparsity pat- 
terns, promoting coupling of non-vanishing components. These measures are typically 
constructed as weighted £i norms of componentwise £q norms of frame coefhcients. We 
show how to compute solutions of linear inverse problems with such joint sparsity reg- 
ularization constraints by fast thresholded Landweber algorithms. Next we discuss the 
adaptive choice of suitable weights appearing in the definition of sparsity measures. 
The weights are interpreted as indicators of the sparsity pattern and are iteratively up- 
dated after each new application of the thresholded Landweber algorithm. The resulting 
two-step algorithm is interpreted as a double-minimization scheme for a suitable target 
functional. We show its ^2-norm convergence. An implementable version of the algo- 
rithm is also formulated, and its norm convergence is proven. Numerical experiments 
in color image restoration are presented. 

AMS subject classification: 65J22, 65K10, 65T60, 90C25, 52A41, 49M30, 68U10 
Key Words: linear inverse problems, joint sparsity, thresholded Landweber iterations, 
curvelets, sub differential inclusion, color image reconstruction 

1 Introduction 

Inverse problems. We address the problem of recovering an element n of a Hilbert space 
/C from the observed datum g = Tu in the Hilbert space TC, where T : /C — > W is a bounded 
linear operator, possibly non-invertible or with unbounded inverse. A simple approach to 
this problem is to minimize the discrepancy 

T{u) := \\Tu-g\nf. 



^Johann Radon Institute for Computational and Applied Mathematics, Austrian Academy of Sciences, 
Altenbergerstrasse 69, A-4040 Linz, Austria, massimo .f ornasierOoeaw. ac . at 

MF acknowledges the financial support provided by the European Union's Human Potential Programme 
under contract MEIF-CT-2004-501018. He also thanks NuHAG for its warm hospitality. 

"'Numerical Harmonic Analysis Group, Faculty of Mathematics, University of Vienna, 
Nordbergstrasse 15, A-1090 Vienna, Austria, holger . rauhutOunivie . ac. at 

HR acknowledges the financial support provided by the European Union's Human Potential Programme 
under contracts HPRN-CT-2002-00285 (HASSIP) and MEIF-CT-2006-022811. 



1 



If ker(T) = {0} then there exists a unique solution given by u* = {T*T)-^T*g. However, if 
T has unbounded inverse, i.e., {T*T)^^ is unbounded then this approach is very unstable. 

Thus, if T is non-invertible or has unbounded inverse (or an inverse with high norm) 
one has to take into account further features of the expected solution. Indeed, a well-known 
way out is to consider the regularized problem [201 

ti* := argmin„g;^T(n) + a||n|/C|p. 

for which the corresponding solution operator : (7 i— > n* is bounded. Unfortunately, the 
minimal norm constraint is often not appropriate. A recent approach is to substitute this 
particular constraint with a more general one 

u%, := argmin„g^T('u) + 

where <I> is a suitable sparsity measure. 

Sparse frame expansions. A sparse representation of an element of a Hilbert space 
is a series expansion with respect to an orthonormal bases or a frame that has only a small 
number of large coefficients. Several types of signals appearing in nature admit sparse frame 
expansions and thus, sparsity is a realistic assumption for a very large class of problems. 
For instance, images are well-represented by sparse expansions with respect to wavelets or 
curvelets, while for audio signals a Gabor frame is a good choice. 

Sparsity has had already a long history of successes. The design of frames for sparse 
representations of digital signals has led to extremely efficient compression methods, such 
as JPEG2000 and MP3 33 . Successively a new generation of optimal numerical schemes 
has been developed for the computation of sparse solutions of differential and integral 
equations, exploiting adaptive and greedy strategies ^SEHIEIEI- The use of sparsity 
in inverse problems for data recovery has been the most recent step of this long career 
of "simplifying and understanding complexity" , with an enormous potential in applications 
HinilEllinilllllllEniESIinillSIini- Another field, which caught much attention recently, 
is the observation that it is possible to reconstruct sparse signals from vastly incomplete 
information [Tj El 1231 1321 136j . This line of research is called sparse recovery or compressed 
sensing. 

From sparsity to joint sparsity. Most of the contributions appearing in the literature 
are addressed to the recovery of sparse scalar functions. Multi-channel signals (i.e., vector 
valued functions) appearing in concrete applications may not only possess sparse frame 
expansions for each channel individually, but additionally the different channels can also 
exhibit common sparsity patterns. Recently, new sparsity measures have been introduced 
that promote such coupling of the non-vanishing components through different channels 
|H1 EHl Eni ■ These measures are typically constructed as weighted £1 norms of channel iq 
norms with g > 1. We will use this concept for the solution of vector valued inverse problems 
and combine it with another approach further promoting the coupling of sparsity patterns 
along channels. 

Our main results. We show how to compute solutions of linear inverse problems with 
joint sparsity regularization constraints by fast thresholded Landweber algorithms, simi- 
lar to those presented in jElESHSHl- We discuss the adaptive choice of suitable weights 
appearing in the definition of the sparsity measures. The weights are interpreted as indi- 
cators of the sparsity pattern and are iteratively up-dated after each new application of 
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the thresholded Landweber algorithm. The resulting two-step algorithm is interpreted as a 
double-minimization scheme for a suitable target functional. We prove that our algorithm 
converges to its minimizer. Since the functional is not smooth, this is done by subdifferen- 
tial inclusions [SZ]. We prove that the thresholded Landweber algorithm, which constitutes 
the inner iteration of the double-minimization algorithm, converges linearly. This feature 
was not ensured by the versions in [111 I39j . The second step of the double-minimization has 
actually a simple explicit solution. Finally, we show that the full exact double-minimization 
scheme converges linearly and we provide an implementable version which is also ensured 
to converge. 

Morphological analysis of signals and sparsity patterns. The use of sparseness 
measures not only allows to reconstruct a signal. At the same time it gives information 
about (joint) sparsity patterns which may encode morphological features of the signal. Well- 
known examples are the microlocal analysis properties of wavelets [301 1^ for singularity 
and regularity detection, and the characterization of edges and curves by curvelets for 
natural images [S]. For instance, the weight sequences appearing in the sparsity measures 
we define, and interpreted as indicators of the sparsity pattern, play a similar role as the 
discontinuity set is playing in the Mumford-Shah functional {34j . In fact, it is well-known 
that wavelet or curvelet coefficients have high absolute values at high scales as soon as we 
are in the neighborhood of discontinuities. Even more illuminating and suggestive is the 
parallel between the sparsity measure and its indicator weights with the Ambrosio-Tortorelli 
^ approximation of the Mumford-Shah functional. Here, the discontinuity set is indicated 
by an auxiliary function which is 1 where the image is smooth and where edges and 
discontinuities are detected. 

Joint sparsity patterns of vector valued (i.e., multi-channel) signals encode even finer 
properties of the morphology which do not belong only to one channel but are a common 
feature of all the channels. Here the parallel is with generalizations of the Mumford-Shah 
functional as appearing for example in |S] where polyconvex functions of gradients couple 
discontinuity sets through different color channels of images. 

Applications. We expect that our scheme can be applied in several different contexts. 
In this paper we limit ourselves to an application in color image reconstruction, modeling 
a real-world problem in art restoration. Indeed, color images have the advantage to be 
non-trivial multivariate and multi-channel signals, exhibiting a very rich morphology and 
structure. In particular, discontinuities (jump sets) may appear in all the channels at the 
same locations, which will be reflected in their curvelet representation (for instance). For 
these reasons, color images are a good model to test the effectiveness of our scheme promot- 
ing joint sparsity, also because the solution can be easily checked just by a visual analysis. 
Of course, the range of applicability of our approach is not limited to color image restora- 
tion. Neuroimaging (functional Magnetic Resonance Imaging, Magnetoencephalography) , 
distributed compressed sensing j^] and several other problems with coupled vector valued 
solutions are fields where we expect that our scheme can be fruitfully used. The numer- 
ical solution of differential and integral operator equations can also be addressed within 
this framework and we refer for example to |14| I38[ II 5j for implementations by adaptive 
strategies. 

Content of the paper. The paper is organized as follows. In Section 2 we introduce 
the mathematical setting. We formulate our model of joint sparsity for multi-channel signals 
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and the corresponding functional to be minimized in order to solve a given linear inverse 
problem. The functional depends on two variables. The first belongs to the space of signals 
to be reconstructed, the second belongs to the space of sparsity indicator weights. Convexity 
properties of the functional are discussed. Section 3 is dedicated to the formulation of 
the double-minimization algorithm and to its weak-convergence. The scheme is based on 
alternating minimizations in the first and in the second variable individually. In Section 4 
we discuss an efficient thresholded Landweber algorithm for the minimization with respect 
to the first variable. Its strong convergence is shown following the analysis in jJTj. The 
minimization with respect to the second variable has an explicit solution and no elaboration 
is needed. We provide an implementable version of the full scheme in Section 5. To prove its 
convergence we develop an error analysis. As a byproduct of the results in this section we 
show that the double-minimization scheme converges strongly. In Section 6 we present an 
application in color image reconstruction. Numerical experiments are shown and discussed. 



Nota on color pictures 

This paper introduces methods to recover colors in digital images. Therefore a gray level 
printout of the manuscript does not allow to appreciate fully the quality of the illustrated 
techniques. The authors recommend the interested reader to access the electronic version 
with color pictures which is available online. 



2 The Functional 
2.1 Notation 

Before starting our discussion let us briefly introduce some of the spaces we will use in the 
following. For some countable index set A we denote by ip = ip{A), 1 < p < oo, the space 
of real sequences u = {ux)x^a with norm 




1 < p < oo 



and ||u||oo := sup;^^^^ I ""A I as usual. If (vx) is a sequence of positive weights then we define 
the weighted spaces £p^y = ^p^„(A) = {u, {uxvx) G ^p(A)} with norm 

\\u\\p,v = \\u\ip,v\\ = ||(^iA^'A)||p = (X^^aI^aD 

\AeA 

(with obvious modification for p = oo). If the entries ux are actually vectors in a Banach 
space X with norm || • ||x then we denote 

ep,v{A,X) := {{ux)xeA,ux X,{\\ux\\x)xeA ^pA^)} 

with norm ||ti|^p,„(A, X)|| = ||(||uA||x)AeA|^p,t>(A)||. Usually X wiU be M*'^ endowed with 
the Euclidean norm, or the M-dimensional space , i.e., M.^^ endowed with the ^g-norm. 
By M-i- we denote the non-negative real numbers. 



i/p 
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2.2 Inverse Problems with joint sparsity constraints 

Let K, and TLj, j = 1, . . . , A^, be (separable) Hilbert spaces and A^j : K, Tij, j = 1, . . . , M, 
i = 1, . . . , N, some bounded linear operators. Assume we are given data gj G TCj, 

M 

9j = ^Aejfi, j = l,...,N. 
e=i 

Then our basic task consists in reconstructing the (unknown) elements fe £ IC, i = 1, . . . , M . 

In practice, it happens that the corresponding mapping from the vector (/^) to the 
vector (gj) is not invertible or ill-conditioned. Moreover, the data gj, j = 1,...,N, are 
often corrupted by noise. Thus, in order to deal with our reconstruction problem we have 
to regularize it. 

Our basic assumption throughout this paper will be that the 'channels' fe, i = 1, ■ ■ ■ , M, 
are correlated by means of joint sparsity patterns. Our aim is to model the joint sparsity 
within a regularization term. In the following we develop this idea. 

For the sake of short notation we resume the data vector into 

N 

g = {gj)j=i,...,M en:= 

where the Hilbert space Ti. is equipped with the usual inner product 9j,^j hj) := 
J2j{9j^ /ij) with gj, hj £ Tij. We also combine the operators Aij into one operator 

M / M \^ 

A:^)C^n, A{fe)fU = E^^.^^^ 

£=1 \£=1 / j=i 

In order to exploit sparsity ideas we assume that we have given a suitable frame {tpx : 
A G A} C /C indexed by a countable set A. This means that there exist constants Ci, C2 > 
such that 

CiWfWl < E < C2II/III for all / G /C. (1) 

aga 

Orthonormal bases are particular examples of frames. Frames allow for a (stable) series 
expansion of any / G /C of the form 

/ = Fn := J^^aV-a (2) 

AeA 

where u = {ux)x^a G ^2 (A). The linear operator F : £2 (A) — > /C is called the synthesis map 
in frame theory. It is bounded due to the frame inequality In contrast to orthonormal 
bases, the coefficients need not be unique, in general. For more information on frames 
we refer to 

A main assumption here is that the fg to be reconstructed are sparse with respect to the 
frame {ipx}- This means that fe can be well-approximated by a series of the form with 
only a small number of non- vanishing coefficients ux- This can be modelled by assuming 
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that the sequence u is contained in a (weighted) ^i(A)-space. Indeed, the minimization of 
the ^i(A) norm promotes that only few entries are non-zero. Taking for instance a wavelet 
frame and a suitable weight, the £i constraint implies that the element to be reconstructed 
lies in a certain Besov space -Bf ^, see [T7] . 

Analogously as in ^Jj such considerations lead to the regularized functional 

2 



J{u) = \\g -Tu\nf + \Wi,,{k"')\\ 



N 

E 



M 



9j 



i=i 



M 



U 



e=i AeA 



which has to be minimized with respect to the vector of coefficients u = 
norm in this functional clearly represents the regularization term. 



=1,. 



Al) 



(3) 



The 



{u 

The numbers v\, 
is determined we 
= Ea^aV'a- The 



A G A, are some suitable positive weights. Once the minimizer u 
obtain a reconstruction of the vectors of interest by means of fi = Fu^ 
algorithm in |17j can be taken to perform the minimization with respect to u. 

The functional J{u) in the form stated, however, does not necessarily model any cor- 
relation between the vectors ('channels') fn, i = 1,...,M. A way to incorporate such 
correlation is the assumption of joint sparsity, see also |29| I4()j. By this we mean that the 
pattern of non-zero coefficients representing fi is (approximately) the same for all the chan- 
nels. In other words, for some finite set of indexes Aq C A and for all i = 1, . . . , N there is 
an expansion 

AeAo 

In particular, the same Aq can be chosen for all f^s. 

We propose two approaches (that can be combined) to model joint sparsity. The first 
one assumes that the mixed norm 

AeA 



|n|4,.(A,^f) 



Fa| 



of u 



(u^) is small. Hereby, ux denotes the vector 
pM ^ _ 



A Je=i 



m 



(Recall also that 



pM 

denotes M^*'^ endowed with the £q-norm). Here, q > 1 and in particular, q = 2 oi q = oc, 
represent the interesting cases, since for q = 1 the above norm reduces to the usual weighted 
ii^y norm. In fact if q is large and some |n^| is large then the channel entries | are also 
allowed to be large for i' ^ £, without increasing significantly the norm ||wA|^f ||. The 
minimization of the above norm promotes that all entries of the 'interchannel' vector ux 
may become significant, once at least one of the components \u{\ is large. 



Introduce the operators Tij = AijF : £2 (A) 



M 



He and 



T:£2iA,R 



n, Tu 




N 



The above reasoning leads to the functional 

K{u) = K^'^\u) := \\Tu-g\n\ 



+ 



N 

E 



M 



|n|4,.(A,£, 
2 



f) 



(4) 



9j 



+ E^aI 

AeA 



^^aI 
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to be minimized with respect to u. In Section |3 we will develop an iterative thresholding 
algorithm similar as in IJTj to perform this minimization. 

The second approach to support joint sparsity is to encode the joint sparsity information 
in some sort of indicator function. This can in fact be done by using the weight {v\) as 
a second minimization variable. To this end we add an additional term to the original 
functional ©, punishing small values of vx. We obtain the functional 

Mu,v) := Jg]lQ{u,v) := \\Tu - g\n\\^ + '^vx\\ux\\i + '^9xipx - vxf 

AeA A 

restricted to vx > 0. Here, (^a)a and (pa)a are some suitable positive sequences. 

Now the task is to minimize Jo{u,v) jointly with respect to both u,v. (Again, once 
this minimizer is determined we obtain fi = Fu^). Analyzing Jq{u,v) we realize that for 
the minimizer {u,v) we will have vx = (or close to 0) if HuaHi = J2eLi I^aI large so 
that i'aII'WaIIi gets small. On the other hand, if \\ux\\i is small then the term 9x{px — vx) 
dominates and forces vx to be close to px- Thus, vx serves indeed as an indicator of large 
values of ||iiA||i. It has the effect, that if vx is chosen small due to one large then also the 
other coefficients u^^, i' £ can be chosen large without making the functional considerably 
bigger. 

Unfortunately, in contrast to the previous functionals, J(n, v) as stated above is no 
longer jointly convex in (n, v) in general (although it is convex as functional of u and of v 
alone). Thus, it cannot be ensured that a local minimum of the functional will be a global 
one, a property that is very crucial for an efficient minimization method. 

To overcome this problem we may add an additional suitable quadratic term. Moreover, 
we can, of course, combine the second approach with the first one and use an ^g-norm instead 
of an ^i-norm for the 'interchannel' vectors Ux- This leads to the most general form of the 
regularized functional considered in this paper, 

J{u,v) = Jg'^l^{u,v) := \\Tu-g\nf + '^vx\\ux\\g + '^uJx\\ux\\l + '^Ox{px-vx)'^. 

agA AeA AeA 

(5) 

Here, ux is a suitably chosen sequence of positive numbers, and 1 < g < oo. 

We will provide a sufficient condition depending on 9x and px in the next subsection 
ensuring the strict joint convexity of J{u,v) in {u,v). Although there is an extra term, 
J{u,v) has similar properties as Jo{u,v). In particular, v can still be seen as a sort of 
indicator function. 

Observe that in the minimum we will always have < vx < px- Therefore, we can assume 
the domain of J to be £2(A,M*^) x p-i(A)+ where i^^p-i{A)^ denotes the (convex) cone 
of all non-negative sequences (vx) € £^^p-i{A). 

Our main contribution consists in providing an algorithm for the minimization of J(n, v). 
It consists in alternately minimizing with respect to u and with respect v. The minimization 
with respect to v can be done explicitly. For the minimization with respect to u we propose 
an efficient iterative algorithm. 

We will mainly study the problem in the real- valued case. The complex- valued case can 
be treated with the same methods (in principle) by observing that C^^ is isomorphic to 
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SO passing from M complex-valued channels to 2M real-valued channels. We note, 
however, that slight complications may arise from the fact that an £q norm on is not 
isometric to an i'q-norm on M^^^ if q 2. (In particular, the thresholding operator on C'^ 
for q = oo will have a different form than the one provided in the next Section for the 
real-valued case). 

2.3 Convexity of the functional J 

At several places in the following it will be convenient to write 

Jiu,v) = r(M) + $(«)(«,?;) (6) 

where 

N M 

j=i e=i 

^(i){u,v) = ^vx\\ux\\g + ^ujx\\ux\\l + ^9x{p\-v\f 
AeA AeA AeA 

= ||(i;A||uA||,)AeA||i + ||t^lV^/2(A,^f)||' + ||;0-i;|V/2(A)f, 

are the discrepancy with respect to the data and the joint sparsity measure, respectively. 
Also it is useful to observe that decouples with respect to A, i.e., 

<^^i\u,v) = Y,^x\^x,vx) (7) 

AeA 

where 

$f (x,y) = y\\x\\g + iOx\\x\\l + dxiPx-yf (8) 

(M N.!/"? M 

^\xe\''\ +u;x^xj + 9x{px-yf, xeW^,y>0 
e=i J e=i 

(with the usual modification for q = oo). 

In the following we give necessary and sufficient conditions for the (strict) convexity 
of the functional ^^^^ for the most interesting cases q = l,2,oo. These imply sufficient 
conditions for the (strict) convexity of J = Jg^p^^- 

Proposition 2.1. Let q G {1,2, oo}. The sparsity measure ^^'^^ is convex if and only if 
^x^x ^ f fof 0,11 A G A, where k = M for q = 1, and k = 1 for q G {2, oo}. In particular, 
if uJxSx > f for all X £ A then J is convex. In case of a strict inequality ujxGx > j we can 
replace "convexity" by "strict convexity" in all of these statements. 

Proof. It is easy to see that is (strictly) convex if and only if all the A G A, are 
(strictly) convex. 
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Let us first consider q = I. Observe tliat we can write ^^^\x,y) = "^^Li ^x\xeTy) 
with 

Fi^\z,y) = y\z\+ujx\z\^ + M~'dx{px-y? (9) 
= {y\z\ + u;x\z\^ + M-^Oxy^) + {pi - 29xy) , z G M, y > 0. 

The function in the second bracket is obviously hnear, hence convex. The function in the 
first bracket can be written as the composition Gx o L with L{z,y) = {\z\,y) and 

Gx{z, y) = yz + uxz" + M-^xy^ = ^(z, y) H^'^ {z, yf, z,yeR, 



where 



^(1) 



2uJx 1 
1 lOxM-^ 



Since L is convex and has range M^, and Gx is monotonically increasing in each coordinate 
on it suffices to show that Gx{z,y) is convex if and only if Oxi^x ^ M/A, see e.g. |1J 
p. 86]. The convexity of Gx is equivalent to H^^^ being positive semidefinite. The latter 
is clearly equivalent to 9x^x ^ M/4. Strict convexity is equivalent to a strict inequality 
Oxiox > M/4. 

Now let q = 2. Observe that ^>f^ = ^oL^^) where L(2) : R^^ ^j^^ ^ ]^2^^ L('^\x,y) = 
{\\x\\2,y) and 

Fi^\z,y)=yz + u;xz^ + ex{px-y? = ^{z,y) H^^Hz,yf + Oxipl - 2pxy), z,yGR 

(10) 

with 



2ujx 1 
1 29x 



By a similar argument as above ^^^^ is convex if and only if H^"^^ is positive semi-definite. 
The latter is the case if and only if ujx(^x ^ 1 /4, and strict convexity is equivalent to a strict 
inequality. 

Finally, let g = oo. Observe that 



m=l 



Since ^^^^^ is the pointwise maximum of M functions, it is sufficient (see j4j p. 80]) to 
investigate the (strict) convexity of each of the functions 

M 



f\,e{x, y) = yxe + ^^a ^ {xef + 9x{px - yf 



^ {y, x) Hj,"^^ {y, xf + 9x{pl - 2ypx), x G R^' , y>0, 



2 
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with 



(oo) 



H. 



One can show by induction that 














^2/ 






5mi 



Si A 
29x1 



(40A^A-1). 



Thus, is positive semidefinite if and only if 6\ujx > 1 /4, and the convexity of is 

equivalent to the latter condition. Once again strict convexity is equivalent to the strict 
inequality. □ 

We do not pursue the task to obtain conditions for the convexity of ^^"^^ and J for 
general q ^ 1, 2, oo, but rather assume that and hence J are always convex also in this 
case. 



(g) 



3 The Minimizing Algorithm and its Convergence 

In this section we propose and analyze an algorithm for the computation of the minimizer 
{u*,v*) of the functional J{u,v) = Jg^li_j{u,v) defined in ©. The algorithm consists in 
alternating a minimization with respect to u and a minimization with respect to v. More 
formally, for some initial choice v^'^\ for example v^^^ = {px)xeAj we define 

:= argmin„g^2(^_iRM) J(u, 

:= argmin^g^^^^_^(A)^ J(n("),u). 

The minimization of J(n, f^"^^^) with respect to u can be done by means of the itera- 
tive thresholding algorithm that we will study in the next section. The minimizer v^"^^ 
of J{u^^^ , v) for fixed u^^^ can be computed explicitly. Indeed, it follows from elementary 
calculus that 

^(n) ^ 1 PA - 2fcll«^"^llg if WU^'^WU < 20XPX .-^2) 

\ otherwise . 

We have the following result about the convergence of the above algorithm. 

Theorem 3.1. Let I < q < oo and assume that <I>('^) and hence J are strictly convex (see 
also Proposition \2. Moreover, we assume that l2 ui^/'^i^-,'^^^) is embedded into i2{A,M.^'^ ) , 
i.e., ^ 7 > for all A G A. Then the sequence {u^"'\v^"'^)nen converges to the unique 
minimizer {u*,v*) G i2{A,M.^'^) x i^^p-i{A)-^- of J. The convergence of u^^^ is weak in 
^2(A,M^'^) and that of v^'^'^ holds componentwise. 

For the most interesting cases q G {1, 2, oo}, if in addition Ox^Jx > o" > 0^/4 for all A G A, 
where (/>i = M , 02 = 1; (poo = then the convergence of m^"^ to u* is also strong in 
^2(A,M^^) and w^") — v* converges to strongly in £2,0 (A). 

The rest of the section will be spent with the proof of the weak convergence of the 
algorithm. The strong convergence and the full proof of the Theorem K-{ . 1 1 will be established 
only in Subsection 5.3 later. 
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3.1 Subdifferential calculus 

A main tool in the analysis of non-smooth functionals and their minima is the concept 
of subdifferential. Recall that for a convex functional F on some Banach space V its 
subdifferential dF{x) at a point x £ V with F{x) < cxd is defined as the set 

dF{x) = {x* £ V*,x*{z - x) + F{x) < F{z) for all z £ V}, 

where V* denotes the dual space of V. It is obvious from this definition that G dF{x) if 
and only if x is a minimizer of F. In the following we investigate the subdifferential of J. 
In order to have J defined on the whole Banach space £2(A,M^) X £oo,p-i(A) rather than 
just for positive f^'s (which is needed to use subdifferentials) we simply extend J{u,v) by 

J{u, v) = CO if f A < for some A e A 

as usual. This extension preserves convexity and does not change the minimizer. 

Recall that J can be written as J{u,v) = T{u) + (^^'^\u,v), see ©. Since both T and 
are convex we have, see e.g. Proposition 5.6], 

dJ{u,v) = dT{u) X {0} + (13) 

Concerning the subdifferential of T we have the following result. 

Lemma 3.2. The subdifferential of T at u £ £2(A, M^^) consists of one element, 

dT{u) = {2T*{Tu-g)}. 

Proof. Since T is convex and Gateaux-differentiable, by Proposition 5.3 |24j we have dT(u) = 
{T'{u)} , where its Gateaux-derivative is characterized by {T'{u), z) = lim/j^o+ 
for all z £ £2(A,M*''^). It is straightforward to check that the Gateaux derivative of a 
functional of the type u — > ||Tn — 5|P (with linear T) at u applied on z is given by 
2{Tu — g, Tz) = 2{T*{Tu — g), z). This proves the claim. □ 

Let us now consider the subdifferential of d^^'^\u,v). Recall its domain ^2(A, M^^) x 
^oo p-i(A). Since the dual of ^oo,p-i is a bit inconvenient to handle we restrict the sub- 
differential to the predual ii^p. This will be enough for our purposes. Moreover, recall 
that decouples into a sum of functionals ^^^^ depending only {ux,vx), see Q. It is 
straightforward to show the following lemma. 

Lemma 3.3. The subdifferential of ^^'^^ at a point {u,v) £ ^2(A, M*^) x l^^p-i (A) with 
<I>('^)(u, u) < oo satisfies 

D^^'^\u,v) := d^^'i\u,v) n (^2(A,R^0 x ^i,p(A)) 

= {(^,77) £ ^2(A,M^0 X h,pW ■ (6,r/A) G d^^^\ux,vx) for all A £ A}. 

We are left with investigating the subdifferential of the functional ^^^^ defined in (|5|). 
Similarly as J we extend it to M*^ x R by $^^^(x, y) = oo for y < 0. 
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Lemma 3.4. Let I < q < oo. Assume that is convex (see also Provosition \2. 1\) . Then 
for {x,y) G x M+ we have 

d^['\x,y) = {(e,r/) GM*^ xE: ^ yd\\ ■ Ux) + 2u;xx, 7] G \\x\\gds+{y) + lOxiy - px)}. 

(14) 

where s^{y) := y for y > and s^{y) = oo for y < 0. In particular, ds~^{y) = {1} for 
y>0 and ds+{0) = {-oo,l]. 

Remark: We recall that the sub differential of the g-norm on M^^ is given as follows. If 
1 < g < oo then 



d\\ ■ Ux) = { 



^9(1) ifx = 0, 



otherwise, 



where B'^' (1) denotes the ball of radius 1 in the dual norm, i.e., in igi with 1/q + 1/q' = 1. 
If g = 1 then 



d\\-Ux) = {^GM*': ^iGd\-\ixi),i = l,...,M} (15) 

where d\ ■ \{z) = {sign(z)} if z / and d\ -1(0) = [-1, 1]. 
If g = oo then 

d\\ ■ Wooix) = i r/ ■ / X 1 ,1 ■ (16) 

[ conv|(sign(x£)e^ : \x£\ = \\x\\oo\ otherwise, 
where conv A denotes the convex hull of a set A and ei the £-th canonical unit vector in 



Proof. Recall that 



^^'\x,y) = s+{y)\\x\\g + u;x\\x\\l + ex{px-yf. 



Let 2/ > so that ^^^\x,y) is finite. The subdifferential d{^^^^)x{x,y) of ^^'^\x,y) consid- 
ered as a function of x alone (i.e. for fixed y) is clearly given by 

d{^^^\{x,y) = yd\\ ■ Ux) + 2ujxx (17) 

while keeping y fixed gives 

di'^['^Ux,y) = ds+iy)\\x\\g + 2exiy-px)- 

This shows the inclusion 'C' in (|14|) . Moreover, for all the points {x,y) E M*^ X M+ where 
is differentiable we even have equality in (|14j) since is convex and, thus, all the 
subdifferentials appearing consist of precisely one point, i.e., the usual gradient. 
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Let 1 < q < oo. Then for x ^ 0, y > the differentiabihty assumption is clearly 
satisfied. For the other = or y = we note that by convexity of we have (see 

[SZl Corollary 10.11]) 

d{<^^^\{x,y) = {^-.31] such that (^,77) G d^'f\x,y)} (18) 

and the corresponding relation for d{^^^^)y{x,y). Now, if y > then ^^^\x,y) is differ- 
entiable with respect to y and thus, t] in the right hand side of (fTH)) is unique, indeed 
V = Vo '§ij^\\^^y)- We conclude that for y > 

d{^^^){x,y) = {(e,%),e e d{<^^^\{x,y)} 

In particular this holds for x = 0, even for general 1 < g < 00. The same argument 
applies for the case y = and x ^ (and 1 < q < 00), which shows (fTl)) in these cases. 
Now let a; = and y = 0. Then the right hand side of (|14|) contains precisely one point, 
i.e., (^, ry) = {0, —20xpx). Since the subdifferential $^^^(0,0) contains at least one point by 
convexity, it must coincide with (^,r/) by the trivial inclusion 'c'. (It is easy to check also 
directly that (0, —29xp\) G <^^^^(0, 0)). Note that this argument applies also for q = l,oo. 

It remains to treat the cases g = 1, 00 with x ^ and arbitrary y > 0. Let us start with 
q = 1. In the proof of Proposition 12.11 it was noted that 

M 

a>«(x,y) = 5]F«(x,,y) 

with F^^^ -.MP^M defined in ©. The subdifferential of Fx can be obtained in the same 
way as above (expressing e.g. formally the modulus as a 2-norm on M^). For (z, y) G R x M_|_ 
this yields 

dFl^'\z,y) = {(r,7?): t e yd\ ■ \{z) + 2u;xz,v & \z\ds+ {y) + 2M~'exiy - px)}. 
By convexity we have 

M 

d^^^\x,y) = Y,{ieezi,v) ■■ {ze,v) G dF^'\xe,y)] 
i=i 

where denotes the ^-th unit vector in R^"^. By the explicit form of the subdifferential of 
the £i-norm (fT5|) this gives (|TH) for g = 1. 

Finally, let q = 00. Similarly as in the proof of Proposition 12. II we write 

^A°°^(a;,y) = ^j^^\Pd^^y) 

with 

Fe{x,y) = y\xe\+ujx\\x\\l+9x{p\-yf- 
If / then F^{x^y) is differentiable with respect to x and 

dFe{x,y) = = ysign{xe)ee + 2uJxx, y e ds+{y)\xi,\ +2ex{y - px)}, 
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where denotes the ^-th canonical unit vector in M^. This even holds for y = by an 
analogous argument as above, see (|TH|) . The subdifferential of ^^^\x,y) for x / is then 
given by (see e.g. jSZl Exercise 8.31]) 

d^^^\x,y) = con-v{dFg{x,y) : Fe{x,y) = max Fm{x,y)}. 

m=l,...,M 

Since x ^ we have 7^ if \xi\ = \\x\\oo and the latter is the case iff F£{x,y) = 
maxm Fm{x , y) . Thus, we obtain 

d'^t\x,y) 

= conv IJ {(C,-?/) : C = ysign{xi)ei + 2uJxx,r] e ds+{y)\xe\ + 29x{y - p\)} 

i:\xi\ = \\x\\oc 

= {(i^v) '■ C G conv{ysign(x£)e£, \xi\ = \\x\\oo},r] £ ||x||oo5s+(y)} + {2uJxx,2exiy - Pa)} • 

By the characterization of the subdifferential of the 00-norm in ()16() we obtain the claimed 
equality in (|14)) for q = 00 and x ^ 0. This finishes the proof. □ 

Combining the previous lemmas we obtain the following result. 

Proposition 3.5. Let 1 < q < 00. Assume that ^^''^ is convex and let {u,v) G £2(A,M'^) x 
£o^ p-i(A) such that ^^'^\u,v) < 00. Then we have 

D<^^''\u,v) = {{C,7]) £ £2{A,R^') X h^p{A),Cx e vxd\\ ■ Uux) + 2UJXUX, 

i]x e \\ux\\gds+{vx) + 29x{vx - px), A G A} 
C a$('')(u,^;) (19) 

and 

DJ{u,v) = dJ{u,v)n{i2{A,R^^) X ii^p{A)) = (2r*r(M -g), 0) + L>$J'^(n, u) C dJ{u,v). 
3.2 Weak convergence of the double-minimization 

Before we actually start proving the weak convergence of the algorithm in (|11|1 we recall 
the following definition ^7] . 

Definition 1. Let y be a topological space and A = (^„)n6N a sequence of subsets of V. 
The subset A C y is called the limit of the sequence A, and we write A = lim.„ An, if 

A = {a £ V : 3a„ G A„,a = lima„}. 

n 

The following observation will be useful for us, see e.g. [HTl Proposition 8.7]. 

Lemma 3.6. Assume that F is a convex function on and (x„) C M^^ a convergent 
sequence with limit x such that T{xn),F{x) < 00. Then the subdifferentials satisfy 

lim5r(x„) C dT{x). 

n 

In other words, the subdifferential dT of a convex function is an outer semicontinuous set- 
valued function. 
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In the following we agree on the convention that the upper index n at u^") G ^2(^,1 
always denotes the re-th iterate and u^"^ G denotes the (vector-valued) entry at A of 
the n-th iterate. In the following proof we will never refer to the i-th. component of the M- 
dimensional vector u^x^, so hopefully no confusion can arise. Also, we denote by {DJ{u, v))\ 
the restriction of DJ{u,v) C £2(^5 M*^) x £i p(A) to the index A. By the previous section it 
holds 

{DJ{u, v))x = {2T*T{u - g))x, 0) + d<^['\ux, vx). (20) 

Now the proof is developed as follows. First, we recall that {u*,v*) = argmin J(n, u) if 
and only if G dJ{u* ,v*). Next, we show that there exist weakly convergent subsequences 
of (u("\f(")) (again denoted by (u^^^f^"^)) which converge to {u^°°\v^°°^) and that 

G limDJ(n("),w(")) C dJiu'-°^\v'-^^). (21) 

n 

Due to the strict convexity of J we conclude that (u(°°),w(°°)) = {u*,v*). Now, let us detail 
the argument. 

By definition of n*^") and u we have 

J(^i("),^;(")) _J(U{« + 1), 

= J(n("),i;(")) - J('u("+i),t;(")) + J(u("+^\ w^")) - J(n("+^), ^;("+i)) > 0. 

Thus, {J{u^'^\v^"'^))n is a nonincreasing sequence, and since J > this implies that 
(J(n("),r;("))) n converges. Moreover, 



,{")||2 



AeA 



Therefore, («("■))„ is uniformly bounded in ^2 1^1/2 (A, M*^) and thus, there exists a subse- 
quence (ti^"''-))fc that converges to G £2,^^1/2 (A, M*'^) weakly in both i2^^i/2{A,R'^'^) and 
£2(A,M*^), due to our assumption i^a ^ 7 > for all A G A. For simplicity, let us denote 
again = u^"-\ 

First of all, observe that weak convergence implies componentwise convergence, so that 
^ ^A°°^ and [r*rn(")]A ^ [r*ru(°°)]A for ah A G A. By the explicit formula (HH) for 
v^^^ this implies that u*-"^ converges pointwise to the limit 

vt^ := limt;(") = I ' if ^("^^aII. < 20aPa, ^22) 

n [0 otherwise. 

By definition of u^"^^ in (|11|) we have G 5Ju(n, f^"^) (where dJu{u,v) denotes the subdif- 
ferential of J considered as a functional of u only). This means that 



G 



2r*(rn(") -g) + v';'~'>d\\ ■ ||g(M^"0 + 2ujxu';'' for all A G A, 
A 



see also Lemma 1,3. 21 and (|17j) . in other words 



2T*(rn(") - ff) + <"'^Cr + 2t^A<', (23) 

A 



(n-l)An) , n,.„„(«) 
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for a suitable cl"^ G d\\ ■ \\q{u^^^). Now, let (^("),ry(")) G L»J(n("), By definition of I?J 

and by (|23j) we have 



for a suitable G <9|| • )• Since converges it is possible to choose the sequence 

'A 



^(") such that lim„^oo ^i"^ = for all A G A. From ^ it is straightforward to check that 



0£ds+{v'^^^)\\J^'^\\g + 2ex{v[^'^ -px) for ah AG A, (24) 



^A ^II^A llg^^'^AV'^A 

and similarly 



OG9.+ (^i"))||4")||, + 20,(^;i")-p,) 

We can choose t]^'^^ = so that lim„ r/^"^ = for all A G A. Altogether we conclude that 
G limn{DJ(u^"'\v^"^))x for all A G A. By continuity of T and Lemma 13.51 we conclude 



G hm 

n L 



(2(r*r(^W-5))A,o) + aci>i''^K ^'^a 

C (2r*r(^x(°°) -5)a,0) +a<I>i^^(nf )) = DJ{u^^\v^^^)x 



for all A G A. It follows that G J(n(°°), C dJ{u^°°\v'^'^'>), the latter inclusion by 

Proposition (|3.5|) . Hence, by strict convexity (u*,v*) = {u^°°\v^°°^). With this we have 
shown the weak convergence of the sequence u*-"-* to u*. 

To establish the strong convergence we need to develop a more detailed analysis of the 
minimization of J with respect to u. Next section is devoted to this end, and it will allow 
us to use some further tools for the full proof of Theorem I.S. II in Subsection 5.3. 

4 An Iterative Thresholding Algorithm for the Minimization 
with Respect to u 

One step of the minimization algorithm in the previous section consists in minimizing 
J{u,v) = Jg^p i^{u,v) for some fixed v. Moreover, keeping v fixed is also interesting for 

its own - in particular, if one is interested in minimizing the functional K = K^'^ defined in 
(jSnj- Indeed, for a; = and p = v we have J^fl^iu, v) = kI?\u). As we will describe in the 
following this minimization task can be performed by a thresholded Landweber algorithm 
similar to the one analyzed by Daubechies et al. in |17j . 
With V fixed our task is equivalent to minimizing 

K{u) = := WTu-gfu + ^iu) (25) 

with respect to u G ^2(A,M^^) where 

:= ^liW := E^^ll^^ll'? + E'^^ll^^ll2- (26) 
AeA AeA 
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We assume that T is non-expansive, i.e., ||r|| < 1, which can always be achieved by rescaling. 
Also we suppose that K is strictly convex. This is ensured if e.g. the kernel of T is trivial 
or WA > for all A > 0. 

We define a surrogate functional by 

K^{u,a) := K{u) — \\Tu — Ta\\'^ + \\u — a\\2 = \\Tu — g\\y^ + ^{u) — WTu — TaWy^ + Wu — aW^- 

Since ||r|| < 1 also K'^ is convex, see jl7j for a rigorous argument. Now starting with some 
u^^^ £ i2iA,M^^) we define a sequence n*-™^ by 

n(™+i)=arg min K'(u,u^'^h 

The minimizer of K'^{u,a) (for fixed a) can be determined explicitly as follows. First, we 
claim that 

aigminK^ {u,a) = U^^[a + T* {g — To)), 

u 

where the "thresholding" operator C/iji is defined as 

C/^(u) := arg min \\u-z\\l+^{z). (27) 

Indeed, a direct calculation shows that 

K^{u, a) = \\Tu — g\\j^ — \\Tu — Ta||^ + ||m — a\\\ + ^{u) 

= \\{a + T*{g - Ta)) - u\\l + ^(n) - \\a + T*{g - Ta)\\l + \\g\\l^ - \\Ta\\l, + \\a\\l 

Since the last terms (after ^{u)) do not depend on u they can be discarded when minimizing 
with respect to u, and the above claim follows. (The same argument works also for general 
'sparseness measures' Thus, the iterative algorithm reads 

^(m+i) = f/^ (^(m) ^T*^g_T^{^)^y (28) 

In the following we give more details about f/ij, and analyze the convergence of this algo- 
rithm. 

4.1 The thresholding operator 

Let us derive more information about C/ij, for our specific ^ = ^v,^} in l|26j) . We have the 
following lemma. 

Lemma 4.1. Let 1 < q < oo. It holds 

{U^liu))^ := (f/^(,)(n))A = (l+u;A)-'S(?(nA), 

where 

Si'^\x) = arg min \\z - x\\l + v\\z\\q, x G M^. (29) 
Furthermore, si''^ is given by 

Si'^\x) = x-p4(x), (30) 
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where P^^2 denotes the orthogonal projection onto the norm ball of radius v/2 with respect to 
the dual norm of \\- \\q, i.e., the \\ ■ \\qi-norm with q' denoting the dual index, \/q + \/q' = 1. 
(The analogous result holds also if the norm \\ ■ \\q is replaced by an arbitrary norm on M}'^ ). 

Proof. For the minimizing problem defining Um = Uvlil decouples with respect to 

A G A. Thus, we have 

iU^'!l(.x))x = arg min ||xa - z\\l + uJx\\z\\l + WAllzHg. 



If z minimizes the latter term then necessarily G 2(1 + ujx)z — 2x + vxd\\ ■ \\q{z) where 
d\\ ■ \\q denotes the subdifferential of the (7-norm. In other words, 

il + u;x)z-xG-^d\\-Uz). 

Since || • is 1-homogeneous we have d\\ ■ \\q{z) = d\\ ■ ||q((l +ujx)z). Setting y = (1 +ujx)z 
gives y — X £ —^d\\ ■ \\q{y), which is the above relation for ujx = 0. From this we deduce 
the first claim. 

Let us show the second claim, i.e., the explicit form of the operator si'^\ We already 
know that if z minimizes the left hand side of (|29)) then x — 2;G(9|||z||g. Let ip{z) = ^\\z\\q 
and V'* be its Fenchel conjugate function defined by = sup^((x,y) — f{x)). It is 

well-known [U p. 93] that 



if \\y\\q' < v/2 
00 otherwise 



Here B'^' {v/2) denotes the norm ball of radius v/2 with respect to the dual norm of || • \\q. 
It is a standard result, see e.g. |32l Proposition 11.3], [HJ Corollary 5.2], that w G dil){y) if 
and only if y G dtp*{w) yielding z G d'4j*{x — z) in our case, and hence, 

X e X - z + dip*{x - z) = X - Z + dXBq'{v/2)(^ ~ 

Now if y G w + dxgq' i2){'^) then it is straightforward to see that w must be the orthogonal 

projection of y onto B'^' {v/2), i.e., w = argmin^,g^,/^^y2) 11^' ~ y\\2, see also jSH Example 

10.2 and p. 20]. For our situation this means that x — z = P!/,i2{^)i i-^-; z = x — P^^^i^)- 
This shows the second claim. 

Clearly, all arguments work also for a general norm rather than the g-norm. □ 

Let us give si"^^ explicitly for g = 1, 2, 00. 
Lemma 4.2. Let x G and ?; > 0. 

(a) For q = 1 we have si^\x) = {s^\xe))f^-^ where for y G M 

sW{y) = I ° ^^1^1^^' 
1 sign(y)(|y| — |) otherwise. 
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(b) For q = 2 it holds 



i/||x||2<i, 
(11x112 ^/^) J. otherwise. 



lFl|2 



(c) Let q = oo. Order the entries of x by magnitude such that > {xi^l > . . . > |x 

1. If \\x\\i < v/2 then slr\x) = 0. 

2. If \\x\\i > v/2, let n £ {1, . . . ,M} be the largest index satisfying 

1 ' ■ ■ ^ 



Then 



s— ^X.KI-5j. (31) 



{SirHx))i^ =Xi^, j = n + l,...,M. 
Proof, (b) The projection -Pj/2(^) "-"^ onto an £2 ball of radius t;/2 is clearly given by 

'^/^ I otherwise . 

Since by the previous lemma si^\x) = x — P'^i2{x) this gives the assertion. 

(a) Although this is well-known we give a simple argument. For q = 1 the functional in 
(|29|) defining S^^^ decouples, i.e., 

M 

S'i^^(x) = arg min ^ - x<?p + v\zi\) . 



Thus, S^\x)(, = argmin^^gK — x^p + v\xi\ for all £ = 1,...,M. The latter can be 
interpreted as the problem for g = 2 on and hence, the assertion follows from (b). 

(c) If ||x||i < u/2 then -P}/2(^) ~ ^ previous lemma s'^\x) = x — -P}/2(^) ~ 

0. Now assume ||x||i > v/2. Let z = s'^\x). This is equivalent to being contained in 
the sub differential of the functional in (|29|) defining S-//^^ . This means 

2{z-x)£-vd\\-\\Uz)- (32) 



We recall that the subdifferential of the maximum norm is given by 

Now assume for the moment that the maximum norm of z is attained in Zj^ , . . . , Zi^ . 
We will later check whether this was really the case. Further, we assume for simplicity 
that all the entries Xj^ , . . . , Xj^ are positive. (The other cases can be carried through in the 
same way). Then certainly also the numbers , . . . , Zi^ are positive because choosing them 
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with the opposite sign would certainly increase the functional defining Sv°^\ Then by 1)16(1 
we obtain 2{zi. — Xi-) = for the entries Zi. not giving the maximum, i.e., Zi- = xi-^j = 
n+l,...,M. 

Moreover, if n = 1 (i.e., the maximum norm of z is attained at only one entry) then 
2{zi^ — Xi^) = —V, in other words, Zi^ = xi^ — v/2. Thus, the initial hypothesis that the 
maximum norm of z is attained only at Zj^ is true if and only if the second largest entry Xi^ 
satisfies Ixjjl < \zi^ \ — v/2. 

So if the latter inequality is not satisfied then the maximum norm of z is at least attained 
at two entries, i.e., n >2. In this case by ()l(i|) the entries = = • • • = 2i„ = t satisfy 



2t - 2xi^ = -vaj, j = 1, . . . , n - 1, 
2t - 2xi„ = -v(^- 

for some numbers ai, . . . S [0, 1] satisfying Ylij'^i — 1- This is a system of n linear 
equations in t and oi, . . . , a^-i- Writing it in matrix form we get 



/I v/2 
1 





v/2 








\1 -v/2 -v/2 -v/2 



\ 



V2/ 



ai 



( ^ii \ 



\an-ij \xi„ - v/2j 



Denoting the matrix on the left hand side by i?, a simple computation verifies that 



/ 1 

2(n-l) 



B- 



1 

n 



1 

_ 2 

V 

2(ra-l) 

V 



1 



1 \ 

2 



\ V 



2 2(n-l) 



V/ 



This gives 



and aj = — [v/2 + (n - \)xi^ - Efce{i,..., 
i G {l,...,n- 1} 

1 



n}\{j} 



Thus, all a,- are non-negative if for all 



Xi. > 

' - n-l 



A:e{l,...,n}\{i} 



Moreover, a simple calculation gives X]"=i aj = + ^ (Z]j=i ^ij - {n - • Thus, 

it holds 1 — X]j=i ^ if and only if 



Xi > 



n 
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Therefore, the initial assumption that the maximum norm of u is attained precisely at 
Zi-^, . . . , Zi^ can only be true if , . . . , Xi^ are the largest entries of the vector x and 



i.e., < n \xi^ \ — v/2). Pasting all the pieces together shows the assertion of 

the lemma. □ 

4.2 Weak convergence 

In the following we will prove that u^'") converges weakly and strongly to the unique min- 
imizer of K. We first establish the weak convergence. Following the proof of Proposition 
3.11 in jl7j one may extract essentially three conditions on a general sparsity measure ^ 
such that weak convergence is ensured. Let us collect them in the following Proposition. 

Proposition 4.3. Assume K is given by i25\) with a general sparsity measure ^ and suppose 
K is strictly convex. Let f/iji be the associated 'thresholding operator' given by \21\) . Assume 
that the following conditions hold 

(1) Uqj is non- expansive, i.e. \\U^{x) — Uqj{y)\\2 < — y||2 /or a// G ^2(A,M^^). 

(2) It holds II/II2 ^ ^(^(/)) fof o-ll f G -^2(^,1^*^) and some monotonically increasing 
function H on M-|_. (This ensures that a sequence fn satisfying ^{fn) < C is bounded 
in £2(A,K^0;. 

(3) For all x,he £2(A, it f^olds 

^(C/>t(x) + h)- ^(C/^(x)) + 2{h, U^{x) -x)>0. 

Then the sequence u^"^^ defined by \2^) converges weakly to the minimizer of K indepen- 
dently of the choice of u^^^ . 

Proof. First we claim that the condition in (1) implies that the surrogate functional K'^ 
satisfies 

K'{u + h,a)-K'{u,a) >\\h\\l (33) 

for u = argmin„' K^{u' ,a) = Uys/{a — T*{g — Ta)). Indeed, set x := a — T*[g — Ta), i.e., 
u = Uqj{x). Then an elementary calculation yields 

K'{u + h,a)- K\u) = \\T{u + h) - g\\l + -^{u + h) - \\T{u + h) - Ta\\l + \\u + h- a\\l 

— \\Tu — g\\\ — ^{u) + \\Tu — Ta\\\ — \\u — a\{^ 
= 2(/i, u-a-T*{g- Ta)) + ^'(n + h) - ^'(n) + \\h\\l 
= 2{h,Um{x)-x) + ■^{U^^{x) + h)-■^{U^,{x)) + \\h\\l > \\h\\l 

The relation in (1) was used in the last inequality. 

Now with H33() and properties (2) and (3) one can easily justify that the proofs of the 
analogues of Theorem 3.2 until Proposition 3.11 in ^Jj go through completely in the same 
way, which finally leads to the statement of this proposition. □ 
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Let us now show that for our specific choice of ^' = ^'i'^i properties (1) - (3) in the 
previous Proposition hold, and thus, n^"^ converges weakly to a minimizer of K. 

Lemma 4.4. C/i^2 = U (,) is non- expansive. 

Proof. Clearly, the map x i-^ {\ + uj\)~^x is non-expansive. By pOf) we have slf^ = I — P^,^. 



v/2- 

Since Pj^g is an orthogonal projection onto a convex set also slf^ is non-expansive, see e.g. 



[HHl. Hence, C/i^ 

is non-expansive since on each component x\, A E A, it is a composition 
of non-expansive operators. □ 

Lemma 4.5. If {v\) or (wa) ore hounded away from then condition (2) in Proposition 
I holds. 



Proof. This follows by a standard argument. □ 

If we consider the problem of minimizing J(u, v) jointly over u and v then we certainly 
cannot assume that v is bounded away from 0, but in this case we require that oj\ is bounded 
away from 0. (By Proposition 12.11 this is needed anyway to ensure that J{u,v) is jointly 
convex in u and v). In the case where we only minimize J(n, v) with respect to u (i.e., when 
minimizing ^{u) defined in (|2())l ) we may take u)\ arbitrary (and even ux = 0) but then we 
have to require a lower bound on v\. 

Now consider the third condition in the Proposition. The next lemma shows that it 
suffices to prove it for s'^\ i.e., for ojx = 0. 

Lemma 4.6. Assume that for all x, /i G M^'^ it holds 

^(||S(^)(x) + h\\, - \\Si^\x)\\,) + 2{h, Si'^Hx) - x) > 0. 
Then condition (3) in Proposition \4.3[ is satisfied. 

Proof. By definition of ^'I'^i we need to show that for a;, u > and all x,h E R*^ 

vm+u)-'S^^\x) + h\\,- \\{l+u:)-'si^\x)\U) 

+um+u)-'s^'i\x) + h\\i - 11(1 + u:r^si^\x)\\i) + 2{h, (1 + u^r'si^Hx) -x)>o. 

Setting h' = (1 + uj)h we obtain for the left hand side of this inequality 

{l + u;r'v{\\Si^\x) + h%-\\Si'^\x)\u) 
+(1 + ur'oj (||5(^)(x) + h'Wl - ||5('')(x)||i) + 2(1 + uy^h', 5(^)(x) - x) 
=(1 + [v{\\Si^\x) + h% - \\Si^Hx)\U) + 2{h', 5(^)(x) - x) 
+(1 + [WSi'^Hx) + h'Wl - \\Si'^Hx)\\l - 2{h',Si'^\x))] > il+u;)-^cu\\h'\\l > 0. 

This completes the proof. □ 
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Lemma 4.7. The condition in the previous lemma holds for si'^\ 1 < g < oo (and even if 
the £q norm is replaced by a general norm on ]R*^J. 

Proof. First note that by definition (|29)) and duahty we have 

\\Si'^\x)\U = iv/2r' sup {k,x-P^'^x) 

keBi' iv/2) 

A characterization of the orthogonal projection tehs us that {k — P^^2-^^ ~ — ^ 

all k G B'^'{v/2), see e.g. PHI Lemma 8]. This gives 

\\Si'^Hx)\U = {v/2)-' sup ((A;-p4x,x-p4(x)) + (p4(x),4^)(x)) 

keBi'{v/2) ^ 

< (^/2)-1(p4(x),4^)(x)). 
Using once more that si'^\x) — x = —P^/2(^) further obtain 

villSi^Hx) + h\U - \\S,{q){x)\U) + 2{h, Si''\x) - x) 
>v\\S^^\x) + h\U- 2(p4(x),5('^)(x)) - 2(p4(x),/i) = v\\Si^\x) + h\\, - 2{P^^^^{x), Si^\x) + h) 

>H|5('?)(x) + /i||,-2||Pj;2x||,ni4')(x) + /i||, > 0. 

Hereby, we used that Pj^g is a projection onto B^' {v/2), so ||P'J/2^II'?' — '"/^' This finishes 
the proof. □ 

To summarize we have the following result about weak convergence. 

Corollary 4.8. Let 1 < q < 00 and assume that {v\) or {iO\) is bounded from below. Then 
the sequence u^™-* defined in (|28j) converges weakly to a minimizer of K, where ^ = ^I'ji 
is the sparsity measure defined in ()26|) . (The g-norm in ()26() can be replaced by any other 
norm on M*^). 



4.3 Strong convergence 

The next result establishes the strong convergence. 

Proposition 4.9. Let 1 < q < 00 and assume that {v\) or {i^x) are hounded away from 0. 
In case {v\) is not hounded away from assume further that there is a constant c > such 
that v\ < c for only finitely many A. Then u^"^^ converges strongly to a minimizer of K. 

Proof. The analogues of Lemmas 3.15 and 3.17 in [TJj are proven in completely the same 
way. It remains to justify the analogue of jl7l Lemma 3.18]: If for some a G £2(A,M^^) 
and some sequence converging weekly to it holds lini^T^^oo 

/jM) _ ui'^l{a) - /i("')||2 = then \\h^"'^\2 ^ for m ^ oo. To this end we mainly follow 
the argument in \17]. 

Let c be the constant such that vx < c for A E Aqo for Aqo finite. Then let Aqi be a finite 
set such that ^;^gA\Aoi II^aII^' < <7 for some a < c/2. (Such a set Aqi exists since || • \\g/ and 
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II • ||2 are equivalent norms on and by assumption a G £2 (A, R-^)). Since Aq = Aqo U Aqi 
is also finite, we have X/A€Ao 

||/tf^||2^0for m — > 00 by the weak convergence of ^("^) to 

0. Thus, we are left with proving that II ^a"^ II 2 — for m — 00. 

For each m we split Ai := A \ Aq into the subsets Ai^^ := {A G Ai : H/i^"*^ + a\\\qi < 

vx/2} and Ai,^ = Ai \ Ai,^. If A e Ai then C/if2(a + h^'^^)x = u'fi{a)x = since 

llflA + /if ^ 11,', llaAll,' < vx/2. Thus, ||/if ) - C/g(a + h^^))x + C/if2(a)A||i = ||/if ^ ||i and by 
assumption, 

E ll^f^lli<ll^^"^^-^S(« + ^^'"^) + ^S(«)ll2-0 asm ^00. 
AeAi 

Now let A G Ai^^. We first consider the case that u\ = 0, i.e., Uv^^q{x)\ = si'^\x)x. Since 
lloAllg' < < vx/2 we have si'^\a) = 0, and thus, 

II^M _ ^(,)(^M ^ ^^)||^, ^ ii^M _ ^ ^ P^;/,(4-) + aA)||,. 

= ll^';^/2(^5r^ + «a) - flAllg' > \\P^l/2{h^r^ + «a)||9' - llaAllg' > vx/2 - a > c/2 - a. 

Hereby, we used that ||-PJ^/2(^a"'' + ^A)||g' = vx/2 (because ||/if ^ + OAllg' > vx/2). Since 
every norm on a finite-dimensional space is equivalent there is a constant C such that 

||/.Sr ) - ) + ax) + Si^){ax)\\l > C\c/2 - af > 0. 

However, since by assumption '^xeA II^^^a"^ ~ '^^?(/*a™^ '^>^) + 'S'^a''('^)II2 ^ as m — > 00 
there must exist an mo such that Ai^m is empty for all m > mo- 
In the case that ux does not vanish we have 

||/.f)-C/(f)(/.M+a), + C/(f)(a),||2 = ||/.f ^ - (1 + c.A)-^5(?(/if ^ + aA)||2 

= (1 + u;,)-i||(l ) - (/if ^ + a,)||2 

(34) 

We claim that 

11(1 + u^xWr^ - Sifih^r^ + ax)h > \\h^r^ - Si^ih^r^ + ax)h (35) 

so that we can apply the argument for w;^ = to conclude that Ai^^ is empty for m 
sufficiently large. Let us omit for the moment all indexes A and m for the sake of simpler 
notation. We have 

\\{l+Lo)h- Sii'^ih + a)\\l-\\h- Sii\h + a)\\l = 2ij{h,h- Si'i\h + a)) +Lo^h\\l (36) 
and furthermore, 

{h,h-Si'i\h + a)) = {h,P;'/^{h + a)-a) 

= -^h + a- P^'^^ih + a),a- P^'^^ih + a)) + ||a - P^'/^{h + a)\\l > 0. 

(37) 
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Hereby, we used that a £ B'^' (a) C Bi'{v/2) and the fact that {k- pf^^{x),x- P^'^^{x)) < 

for all k G B'i'{v/2) and x G M^^. Thus, the term in (jSHJ is non-negative and therefore our 
claim holds. □ 

Let us shortly comment on the condition that if vx is not bounded from below there is at 
least some c > such that vx > c except for a finite set of indexes A. This condition is mainly 
relevant when considering also a minimization over (vx)- Then the term "^OxiPx ~vxf' in 
the functional J{u,v) ensures that the sequence {px — vx) is contained in ^2 01/2 • If 9x and 
px are bounded from below this implies that vx can be less than l/2m.mx px, say, only for 
finitely many A. 

5 Numerical Implementation and Error Analysis 

The scope of this section is twofold: We want to formulate an implementable version of the 
double-minimization algorithm and show its strong convergence. To this end we develop an 
error analysis. 

5.1 Numerical implementation 

Let us compose the two iterative algorithms described in and (|28|) . respectively, into a 
unique scheme. 

Algorithm 1. JOINTSPARSE 

Input: Data vector {gj)jLi, initial points u'^^'^ G £2{A.,'K^'^), v^^^ with < v^^^ < px, 

number rimax of outer iterations, 

number of inner iterations Ln, n = 1, . . . , nmax- 
Parameters: q G [1,cxd], positive weights (Ox), (px), (i^a) with u>x > c > 0, 

such that ^^'^^ and hence J are convex, see Proposition \2. 1\ 
Output: Approximation {u*,v*) of the minimizer of jj^^^^ 

„{0,0) .^^(0). 

for n := to TT-max 

do 

for m := to L„ do 
endf or 

^{n+1,0) ._ ^(n,L„). 

0, otherwise . 



endf or 

;= ^('^maxiirimax) ■ 



AeA 



Observe that each (inner) iteration of the above algorithm involves an application of 
T*T and of the thresholding operator The latter can be applied fast. So if there is 

also a fast algorithm for the computation of T*T then each iteration can be done fast. 
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Our analysis ensures the (weak) convergence of this scheme only if the inner loop com- 
putes exactly the minimizer of J{u,v^'"^) for fixed v^"'\ i.e., if L„ = oo. Of course, this 
cannot be numerically realized, so we need to analyze what happens if the inner loop makes 
a small error in computing this minimizer. In other words, how large do we have to choose 
n-max and Ln,n = 1, . . . , rimax in order to ensure that we have approximately computed the 
minimizer u*,v* within a given error tolerance? 



5.2 Error analysis and strong convergence of JOINTSPARSE 

First of all we want to establish the convergence rate of the inner loop, i.e., the iterative 
thresholding algorithm of the previous Section. 

Proposition 5.1. Assume that cox > j > for all X £ A (implying that K{u) = k\j'^{u) 
is strictly convex) and \\T\\ < 1. Set a := (1 + 7)~-'^||/ — T*T\\ < 1. Then the iterative 
thresholding algorithm 

u(^,oo) _ ^{n,m+i) < a||u("'~) - II2. (38) 



u 



converges linearly 
Proof. Note that 



^{n,oo) ._ f^fa)^ ^ J^^(n,oo) + T* {g - Tu^^'^^))) . 

By non-expansiveness of si"^^ (see Lemma 14.41 and its proof) we obtain 

||^(n,oo) _ ^(n,m+l) ||^ 

= llf^S),. (^^"'°°^ +T*(9- rn("'-))) - (u^'^'-) + T*{g - Tn^"'™))' 

= ( ^(1 + a.A)-'||4?((n("'°°) + T*{g - rn("'-))A) - 5^^ ((n("'-) + T*{g - Tu^'''"'^)x)\\V\ 
VAeA / 

< sup(i + cjA)"^||(/-r*r)(ii('^'°°) -k("'™))||2 < (i + 7)-i||/-r*r|| ||«(".°°) -?i("."^)||2 

AeA 

= q||'u("'°°) -^("''"^lla. 

This establishes the claim. □ 

Remark: Clearly, the error estimation in (|.S8)) holds also if one is only interested in analyzing 
the iterative thresholding algorithm from the last section (i.e. without doing the outer 
iteration). Then it might also be interesting to consider the case that uj = 0. According 
to what we have proven in the previous section the algorithm still converges provided the 
weight V is bounded away from zero. However, then the error estimation (|5.1|) has a useful 
meaning only if a = ||/ — T*T|| < 1. So this applies if T*T is boundedly invertible. For 
a usual inverse problem, however, we will have a non-invertible T or at least one with 
unbounded inverse resulting in ||/ — r*T|| = 1. So in this case we only know that the 
algorithm converges, but an error estimate does not seem to be available. 

For simplicity we restrict the following error analysis to the most interesting cases q S 
{1,2, 00}. We first need the following technical result. 



1/2 
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Lemma 5.2. For q £ {1,2, 00} the projection P{j onto the ball B'^(v) C M is a Lipschitz 
function with respect to v £ M+. In particular, we have 

\\P^{x)- Pi,{x)\if\\<L\v-w\ forallxeR^, (39) 

where L = 1 for q = 2 and L = M^^"^ for q £ {1, cxd}. 

Proof. Let us start with q = 2. By distinguishing cases it is not difficult to show that 

\\p^ix)-p^{x)\el'\\<\v-w\. 

For q = 00 we have P^{x) = (p2°(^^))fc=i where for y G M 



y if \y\ < V, 

y — sign(y)(|y| — v) otherwise. 



Since can be interpreted as a projection onto the £2 ball in dimension 1, we obtain that 

\pT{y)-p^{y)\<\v-wl 

and 

||P-(x) -P-(x)Kril = [Y.\pTixi) -P^ixi)n < M'/'\v-w\. 

The case q = 1 requires a bit more effort. By Lemma l4.2l (c) we have the following. Let Xj^. 
denote the reordering of the entries of x by magnitude as in Lemma [4.2l Let n G {1, . . . , M} 
be the largest index satisfying 



\Xi > — 
n 



Then 



n 



[Pv {x)h, = I 2^ I - t; I , J = l,...,n, 

{P^{x))i^ =0, j = n + l,...,M. 

Observe first that for all x G M*^ there exists Eq > such that for all < e < eo the same 
n G {1, . . . , M} is the largest index satisfying 

n — 1 \ 



\xij > -{v + e) 

\k=l 



For < e < Eq, a simple computation yields 



for j = 1, . . . ,n, 
j = n + l,...,M. 
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This means that the map v — > Py{x) is right-difFerentiable, i.e., the hmit 

[P^ {x))+ = hm 

exists in IR*^. Moreover, it also fohows that 

||(P,i(x))V||2 = ^<^. (40) 
To conclude the proof we use the following standard result. 

Lemma 5.3. Let / : M — > M*^ and (/? : M ^ M 5e two continuous and right differentiate 
functions such that 

\\f'4v)\\<^'+iv), 

for all t> G M. Then 

\\f{v) - f(w)\\ < ip(v) - ip{w), for all v>w. 

According to the notation of this latter lemma, let us set f{v) = P^{x) and ip{v) = 
M^/^w. Since -Pj(x) is a continuous function with respect to v (in fact this is true for any 
projection onto convex sets, see |SZ1)) by (jifl|) and an application of the lemma we conclude 
that ||Pj(x) - Pl{x)\\ < M^/^\v - w\. □ 

Observe that the strict convexity of ^^'^\u,v) is equivalent to 9xujx > k/4, see Proposi- 
tion [23 In the following Proposition we require the slightly stronger condition that 9xuj\ 
is bounded strictly away from k/4, at least for q = 1,2. 

Proposition 5.4. Let q £ {l,2,c«}. Assume that Oxi^x > c > 0g/4 for all A G A, where 
(f)i = M, (f)2 = 1; <PoD = "'/M, implying that ^^'^\u,v) and J{u,v) are strictly convex, see 
Proposition ]^. li Moreover, let us assume that ^ 7 > for all A G A. Suppose \\T\\ < 1 
resulting in \\L — T*T\\ < 1. Set 

AgA 46'au;a + 46'a(1- ||/-r*r||) 4a 
Then for each n G N one has the following error estimate 

II^^Koo) _^i*|£2(A,M^-^)|| < -u*|^2(A,M^'0||. 
Proof. Let us consider the n-th iteration of the outer loop. We have 

^(n,oo) ^ jjij) / (n,oo) ^ T* (g - Tn("'°°))). 

By the weak convergence of the double-minimization algorithm, also the minimum solution 
u* satisfies a similar relation, 

u* = ui'Pju* + T*{g-Tu*)). 
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Recall that Uv'^}j{y)\ := (1 + oj\) '^s'f^{y\). By non-expansiveness of s'^'^ f Lemma 14. 4() we 
have 

= (1 + Loxr%P - ylh + mitjylx - ui'iliynxh 

< (1 + Loxr'wi - T*T\\ - ulh + rS,jy*)A - c/it(2/*)Aii2- 

This implies 

II^K-) _ < (1 - (1 + co.r^I - T*T\\)-' \\U%Jy*)x - Ui^liy^xh 
= {l+cox-\\I- T*T\\)-' (yl) - S.. 

Recall from Lemma l4. II that si'^\2){x) = x — where Pj^2 denotes the orthogonal 

projection of x onto the ^^-ball of radius 11/2. By Lemma 15.21 we have that for any z G M*^ 

So si'^\z) is also Lipschitz in v. Let us recall that 

{1 II (".0) II II (n.O) II ^ r,^ 
0, otherwise . 

and 

px-^\K\\q, \\ui\u<2expx 

0, otherwise . 



By distinguishing cases we can show that 



\ M * I ^ 

l-A --aI<^ 



1 



I II *|| 

Fa II? ~ IFaII? 



^ 1 II (".0) *|| ^ ^11 (n,0) *|| 



where i? = 1 for q £ {2, 00} and R = M for g = 1. Pasting the pieces together yields 

II (n,oo) *|| ^ 'pQ II (",0) *|| ^ aw {n,0) * n 

\\u\ — ?i\ 2 < — : \, ^ F\ 2 < P\\U\ lix 2- 

" ^ ^" - 4:6xiu;\ + l-\\I -T*T\\)" ^ ^" " ^ ^" 

Summation over A G A completes the proof. □ 

Let us combine the previous two results to obtain the error estimation for the finite 
algorithm, i.e., for L„ < 00. 

Theorem 5.5. Make the same assumptions as in Propositions 15. j) and \5.4\ Choose Ln 
such that 

5^ := + +^) < 5 < 1 forallneN. 

(This is possible since a,(3 < 1). Then we have linear convergence of our algorithm, i.e., 

-U*||2 < ^nll-U^"-!'") -U*||2. 
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Proof. Using Proposition 15. II and Proposition 15.41 we get 



("'°)-n*||2 < ||n("'°)-u("-i'°°)||2 + ||n("-^'°°)-n*||2 

< a^"-i ||u("-^'°) - II2 + /3||n("-i'°) - u* II2 

< a^"-i ( - u*\\2 + lln* - tx^^-i'^^lb) + - n*||2 



< a^"-i - n*||2 + /3||n("-^'°) - 'u*||2j + /?||n("-i'°) - u*||2 

This concludes the proof. □ 

Remark: The last theorem shows that it is possible to choose the number L„ of inner 
iterations constant with respect to n. 

5.3 Strong convergence of the double-minimization algorithm 

Finally, we can establish the strong convergence of the double-minimization algorithm and 
conclude the full proof of Theorem 13. II 

Corollary 5.6. Under the assumptions of Proposition 15.41 if the minimizer of J{u,v^"^) 
for fixed v^'^^ could be computed exactly, i.e., -L„ = 00 for all n G N, then the outer loop 
converges with exponential rate, and we have 



u 



-n*||2 < -n* 



where we have denoted here Moreover, the sequence v^""^ converges compo- 

nentwise and v^"'^ — V* converges to strongly in £2,e{-^)- 

Proof. The first part of the statement is a direct application of Proposition 15.41 It remains 
to show that f — v* converges to strongly in £2,e{-^)- Using that all norms on M^^ are 
equivalent it follows that 

AeA AeA AeA 

< CY,hl-u^x^\\l = C||n*-n(")|£2(A,M^^)f. 
AeA 

Thus, v^"'^ — converges also strongly in ^2,6»(A)- D 

6 Color image reconstruction 

With this section we illustrate the application of the algorithms for color image recovery. 
The scope is to furnish a qualitative description of the behavior of the scheme. In a sub- 
sequent work we plan to provide a finer quantitative analysis in the context of distributed 
compressed sensing 

We begin by illustrating an interesting real-world problem occurring in art restoration. 
On 11*^ March 1944, a group of bombs launched from an Allied airplane hit the famous 
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Italian Eremitani's Church in Padua, destroying it together with the inestimable frescoes 
by Andrea Mantegna et al. contained in the Ovetari Chapel. Details on "the state of the 
art" can be found in j2S| l27j. In 1920 a collection of high quality gray level pictures of these 
frescoes has been made by Alinari. The only color images of the frescoes are dated to 1940, 
but unfortunately their quality (i.e., the intrinsic resolution of the printouts) is much lower, 
see Figure IHl Inspired by the fresco application, we model the problem of the recovery of 
a high resolution color image from a low resolution color datum and a high resolution gray 
datum. We will implement the solution to the model problem as a non-trivial application 
of the algorithms we have presented in this paper. 

6.1 Color images, curvelets, and joint sparsity 

Let us assume that the color images are encoded into YIQ channels. The Y component 
represents the luminance information (gray level), while I and Q give the chrominance 
information. Of course, one may also choose a different encoding system, e.g., RGB or 
CMYK. Clearly, the color image / = (/i,/2,/3) can be represented as a 3-channel signal. 
In order to apply our algorithm, we need to fix a frame for which we can assume color 
images being jointly sparse. 

It is well-known that curvelets ^ are well-suited for sparse approximations of curved sin- 
gularities. A natural image can in fact be modelled as a function which is piecewise smooth 
except on a discontinuity set, the latter being described as the union of rectifiable curves. 
Moreover, there are fast algorithms available for the computation of curvelet coefficients of 
digital images 

In the following, let us assume that a color image / is encoded into a vector of curvelet 
coefficients (^ii)^eA^'^- image can be reconstructed by the synthesis formula 



where {ij^x : A G A} is the collection of curvelets. The index A consists of 3 different 
parameters, A = (j,p, fc), where j corresponds to scale, p to a rotation, and k to the spatial 
location of the curvelet We do not enter in further details, especially of the discrete 
and numerical implementation, which one can find in [HI EI- 

Let us instead observe that significant curvelet coefficients ux = {u\^u\^u^ will appear 
simultaneously at the same A E A for all the channels, as soon as the corresponding curvelet 
overlaps with a (curved) singularity (appearing simultaneously in all the channels), and is 
approximately tangent to it. This justifies the joint sparsity assumption for color images 
with respect to curvelets. 

6.2 The model of the problem 

The datum of our problem is a three-channel signal g = (51, ffa) £ ^2(^Ar(,7 1^^) where gi, 
i = 2, 3, are the low resolution chrominance channels I and Q, and gi is the high resolution 
gray channel Y. We assume that g was produced hy g = Tu where u = {v} ^u^ ,v?) are the 
curvelet coefficients of the three channels of the high resolution color image that we want 
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Figure 1: Left: The low-resolution color image is here presented after Gaussian filtering 
and downsampling. In the numerical experiments the I and Q channels are used. Right: 
The high resolution gray level image encodes several morphological information useful to 
recover the high resolution color image. 

to reconstruct. The operator T = (T£j)^j=i,2,3 can be expressed by the matrix 

/ F \ 
T = i AF . (42) 
\ AF J 

Here, A is the linear operator that transforms the high-resolution image into the low reso- 
lution image. In particular, A can be taken as a convolution operator (with a Gaussian for 
instance) followed by downsampling. Eventually, we may assume a suitable scaling in order 
to make ||r|| < 1, and a different weighting of the gray channel and the I,Q channel in the 
discrepency term. Since A is not invertible, also the operator T is not invertible, and the 
minimization of T{u) requires a regularization. Clearly, for this task we use the functional 
K defined in Q) or J defined in ©. 

6.3 On the choice of the parameters 

What remains to clarify is the choice of the parameters lox, Ox, and px. The parameter 
•^A ^ 7 > has been introduced for the sole purpose to make J strictly convex. A large 
value of this parameter actually produces an image u which is significantly blurred and no 
information about edges is recovered. Thus, we rather put = 7 = e > small. Due 
to the convexity requirements (see Subsection 2.3), we select Ox ^ The choice of px 
requires a deeper understanding of the information encoded by the curvelet coefficients. 

Indeed, in it was observed that those curvelets that overlap with a discontinuity 
decay like \\ux\\q < 2~^^'^^ while the others satisfy ||nA||g ^ 2~^l'^^ (where j denotes the 
scale). Since we want to recover joint discontinuities we may choose px ■= Pj,p,k ~ 2~^'^ 
with s £ [3/4, 3/2]. By this choice and by (|12|) the locations A for which vx = will indicate 
a potential joint discontinuity. 

Of course, this is just one possible choice of the parameters and further information 
might be extracted from the joint sparsity pattern indicated by v, by the use of different 
parameters. We believe that a deeper study of the characterization of the morphological 
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■-q=1 

q=2 
- q=«> 



Number of iterations 



Figure 2: The £2 error between the original color image and the iterations of the algorithm 
is shown for different values of g = 1,2, 00. We have considered = 105 and rimax = 1, 
the numbers of inner and outer iterations respectively. We have fixed here ujx = 0, 6x = 10, 
and PA = 20 X . 

properties of signals encoded by frames (e.g., curvelets and wavelets) is fundamental for the 
right choice of these parameters. We refer to |301 131j for deeper insights in this direction, 
concerning fine properties of functions encoded by the distribution of wavelet coefficients. 



6.4 Numerical experiments 

According to the previous subsections, we illustrate here the application of JOINTSPARSE 
for the recovery of a high resolution color image from a low resolution color datum and a 
high resolution gray datum. In Figure ^ we illustrate the data of the problem. In this case 
the resolution of the color image has been reduced by a factor of 4 in each direction by 
using a Gaussian filter and a downsampling. We have conducted several experiments for 
different choices of g € {1, 2, 00}, with fixed parameters as indicated in Subsection 16.31 We 
have chosen L„ = 105 and n-max = 1, as well as Ln = 7 and nmax = 15 (the numbers of inner 
and outer iterations, respectively). In the first case, only the minimization of J(u, v^^^) with 
respect to u has been performed, i.e., no iterative adaptation of the joint sparsity pattern 
indicated by v occurred. In order to estimate the different behavior depending of the pa- 
rameters above, we have evaluated at each iteration the ^2-error between the reconstructed 
I and Q color channels and the original I and Q color channels. Figures [21 and El indicate 
that the error decreases for increasing values of q. This means that the increased coupling 
due to the (7-parameter is significant in order to improve the recovery. Recall that the choice 
q = 1 does not induce any coupling between channels. 

This coupling effect due to (7 > 1 is even more evident in Figure El where the adaptation 
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Figure 3: The ^2 error between the original color image and the iterations of the algorithm 
is shown for different values of ^ = 1,2, 00. We have considered Ln = 7 and TT-max ~ 

15, ttie 

numbers of inner and outer iterations respectively. We have fixed here uix = 1/20, 9\ = 10, 
and px = 20x 

of the weight v occurs. The left and the central pictures in the second row of Figure El 
show a reduced color distortion at edges, passing from the case q = I (without coupling) to 
the case q = oo respectively, and consequently a better edge resolution. Nevertheless, the 
differences are not so remarkable. This is due to the fact that, although the functional J 
promotes coupling at edges, it does not necessarily enforce a significant edge enhancement. 
Thus, we may modify the functional by adding an additional total variation constraint on 
the I and Q channels: 

Jxv('"> "f) '■= J{'^-, v) + (|Fu^|tv + I-^^^Itv) • 

The effect of this modification is to promote edge enhancing together with their simultaneous 
coupling through different channels. For the minimization of J^V ^ heuristic scheme 

as in 25^, by alternating iterations for the minimization of J and for the minimization of 
(|Fii^|rpY + \Fv?\^y)^ compare also ^UJ- The corresponding results are shown in Figure 
m where the effect of the coupling (for the cases q = 2,oo) is further enhanced. The right 
picture in the second row of Figure [S] shows the result of the reconstruction in this latter 
case. The edges are perfectly recovered. 

These numerical experiments confirm that the use of the joint sparsity measure ^'^'^^ 
associated to the curvelet representation can improve significantly the quality of the recon- 
structed color image. Better results are achieved by choosing q = oo and by the adap- 
tive choice of weights as indicators of the sparsity pattern. Further improvements can be 
achieved by channelwise edge enhancing, e.g., via total variation minimization. An appli- 
cation to the real case of the art frescoes is illustrated in Figure 
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Figure 4: Ttie £2 error between the original color image and ttie iterations of the algorithm 
is shown for different values of (7 = 1 , 2 , 00 . We have considered Ln ~ and f^-max ~ 

15, the 

numbers of inner and outer iterations respectively. We have fixed here ujx = 1/20, 9x = 10, 
and Pa = 20 X 2^^ . Further iterations of TV minimization are added in the outer loop to 
enforce edge enhancing. 

7 Final Remarks 

1. If the index set A is infinite then T*T is represented as a biinfinite matrix and thus 
its evaluation might not be exactly numerically implementable. In a subsequent work we 
will consider the case ^^A = 00 and the treatment of sparse (approximate) evaluations of 
biinfinite matrices in order to realize fast and convergent schemes also in this situation, 
compare also [SSI • 

2. To exploit the optimal performance of the scheme, an extensive campaign of numerical 
experiments should be conducted in order to further refine the choice of parameters. It is also 
crucial to investigate the deeper relations among the parameter px, the multifractal analysis 
as, e.g., in and morphological image analysis. In particular, the parallel between the 
functional J and the F-approximation of the Mumford-Shah functional by Ambrosio and 
Tortorelli is suggestive: 

Feiu,v) := I {u-gfdx+ / v^f{Vu)dx+ j e\Vu\^dx+ I \e\Vv\^ + ^{l - vf] dx, 
Jn Jn Jn Jn \ 4e J 

' V ' ^ V ' ^ V ' ^ V ' 

~r(u) ~Ea'"aII«II<7 ~EA'^Al|M|li ~EA^A(pA-fA)2 

where / is a suitable polyconvex function, e.g., f(Vu) = (|Vnp + \ux x Uy\) = (|Vup + 
|adj2(Vu)|), adj2(^) is the matrix of all 2 x 2 minors of A. The minimization of this 
term enforces that derivatives of different channels are large only in the same directions. 



35 



Figure 5: First row. Left: Portion of the low resolution color image. Center: Portion of 
the reconstructed color image by Gaussian interpolation and substitution of the Y channel 
with the gray level datum. Evident color artifacts appear at edges. Right: Portion of the 
original color image. Second row. Left: Portion of the reconstructed color image for q = 1, 
Ln = 105, and nmax = 1- Center: Portion of the reconstructed color image for q = oo, 
Ln = 7, and nmax = 15- Right: Portion of the reconstructed color image for q = oo, L„ = 7, 
'^max = 15, and TV minimization. 

According to the specific choices of p\ to indicate the discontinuity set of u, and for wa = e 
and 9x = we may investigate the behavior of the functional J for e ^ and its 
relation with the Mumford-Shah functional. The term Y^^^ ^(pa — vx)'^ essentially counts 
the number of curvelets that, from a certain scale j on such that ~ e, do overlap with 
the discontinuity set and are nearly tangent to the singularity. We conjecture that for e ^ 
and for a rectifiable curved discontinuity, this term estimates the length of the discontinuity. 

3. While we were finishing this paper, we have been informed by G. Teschke of the 
results in |19j . In this manuscript the authors consider linear inverse problems where the 
solution is assumed to fulfill some general 1-homogeneous convex constraint. They develop 
an algorithm that amounts to a projected Landweber iteration and that provides an iterative 
approach to the solution of this inverse problem. In particular for the case lo = {uj\)\ = 0, 
some of our results stated in Section 4 can be reformulated in this more general setting and 
therefore derived from j^H]- However, for to ^ the sparsity measure as in (|2(ij) is not 
1-homogeneous and the elaborations in Section 4 are needed. Moreover, for the relevant 
cases q = l,2,oo, we express explicitly the projection P^^2- -^^^ their generality, the 
results in do not provide concrete recipes to compute such projections. 
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Figure 6: Left: Low quality color image of the fresco dated to 1940. Center: High quality 
gray image of the fresco dated to 1920. Some details are not visible in the color version. 
Right: The reconstructed image after 6 outer iterations with 7 inner iterations each, for 
q = oo. The final Y channel is substituted with the high resolution gray level datum. The 
discountinuities are enhanced and no artifact colors appear. 

8 Conclusion 

We have investigated joint sparsity measures with respect to frame expansions of vector 
valued functions. These sparsity measures generalize approaches valid for scalar functions 
and take into account common sparsity patterns through different channels. We have an- 
alyzed linear inverse problems with joint sparsity regularization as well as their efficient 
numerical solution by means of a novel algorithm based on thresholded Landweber iter- 
ations. We have provided the convergence analysis for a wide range of parameters. The 
role of the joint sparsity measure is twofold: to tighten the characterization of solutions 
of interest and to extract significant morphological properties which are a common feature 
of all the channels. By numerical applications in color image restoration, we have shown 
that joint sparsity significantly outperforms uncoupled constraints. We have presented the 
results of an application to a relevant real-world problem in art restoration. The wide range 
of applicability of our approach includes several other problems with coupled vector valued 
solutions, e.g., neuroimaging and distributed compressed sensing. 
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